Abstract

We investigate questions related to the presence of primitive words and Lyndon
words in automatic sequences. We show that the Lyndon factorization of a k-automatic
sequence is itself k-automatic. We also show that the function counting the number
of primitive factors (resp., Lyndon factors) of length n in a k-automatic sequence is
k-regular.

1 Introduction 

We start with some basic definitions. A nonempty word w is called a power if it can be
written in the form w = x^k for some integer k ≥ 2. Otherwise w is called primitive. Thus
murmur is a power, but murder is primitive. A word y is a factor of a word w if there exist
words x, z such that w = xyz. If further x = ε (resp., z = ε), then y is a prefix (resp., suffix)
of w. A prefix or suffix of a word w is called proper if it is unequal to w.

Let Σ be an ordered alphabet. We recall the usual definition of lexicographic order on
the words in Σ*. We write w < x if either

(a) w is a proper prefix of x; or

(b) there exist words y, z, z' and letters a < b such that w = yaz and x = ybz'.

For example, using the usual ordering of the alphabet, we have common < con < conjugate.
As usual, we write w ≤ x if w < x or w = x.

A word w is a conjugate of a word x if there exist words u, v such that w = uv and x = vu.
Thus, for example, enlist and listen are conjugates. A word is said to be Lyndon if it is
primitive and lexicographically least among all its conjugates. Thus, for example, academy
is Lyndon, while googol and googoo are not. A classical theorem is that a finite nonempty word is
Lyndon if and only if it is lexicographically less than each of its nonempty proper suffixes [7].
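As a small illustration (our own Python sketch, not part of the automaton-based method of this paper; the function names are hypothetical), the two characterizations of Lyndon words can be compared directly on finite words. Python's built-in string comparison is exactly the lexicographic order defined above, including the rule that a proper prefix is smaller.

```python
def rotations(w):
    """All conjugates (cyclic shifts) of w, starting with w itself."""
    return [w[i:] + w[:i] for i in range(len(w))]


def is_lyndon_by_conjugates(w):
    """Primitive and least among its conjugates.

    Requiring w to be strictly smaller than every nontrivial rotation covers
    both conditions: if w is a power, some nontrivial rotation equals w.
    """
    return len(w) > 0 and all(w < c for c in rotations(w)[1:])


def is_lyndon_by_suffixes(w):
    """The classical criterion: w is smaller than each nonempty proper suffix."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


for word in ["academy", "googol", "googoo"]:
    print(word, is_lyndon_by_conjugates(word), is_lyndon_by_suffixes(word))
# academy True True
# googol False False
# googoo False False
```

Here googoo fails because it is a power, and googol fails because its conjugate golgoo is smaller.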

We now turn to (right-) infinite words. We write an infinite word in boldface, as x =
a_0 a_1 a_2 ···, and use indexing starting at 0. For i ≤ j + 1, we let [i..j] denote the set
{i, i + 1, ..., j}. (If i = j + 1 we get the empty set.) We let x[i..j] denote the word
a_i a_{i+1} ··· a_j. Similarly, [i..∞] denotes the infinite set {i, i + 1, ...} and x[i..∞]
denotes the infinite word a_i a_{i+1} ···.

An infinite word or sequence x = a_0 a_1 a_2 ··· is said to be k-automatic if there is a
deterministic finite automaton (with outputs associated with the states) that, on input n
expressed in base k, reaches a state q with output τ(q) equal to a_n. For more details, see [5]
or [3]. In several previous papers [1, 4, 14, 16, 8], we have developed a technique to show 
that many properties of automatic sequences are decidable. The fundamental tool is the 
following: 

Theorem 1. Let P(n) be a predicate associated with a k-automatic sequence x, expressible
using addition, subtraction, comparisons, logical operations, indexing into x, and existential
and universal quantifiers. Then there is a computable finite automaton accepting the base-k
representations of those n for which P(n) holds. Furthermore, we can decide if P(n) holds
for at least one n, or for all n, or for infinitely many n.

If a predicate is constructed as in the previous theorem, we just say it is "expressible". 
Any expressible predicate is decidable. As an example, we prove 

Theorem 2. Let x be a k-automatic sequence. The predicate P(i, j) defined by "x[i..j] is
primitive" is expressible.

Proof (due to Luke Schaeffer). It is easy to see that a word is a power if and only if it is equal
to some cyclic shift of itself, other than the trivial shift. Thus x[i..j] is a power if and only if
there is a d, 0 < d < j − i + 1, such that x[i..j − d] = x[i + d..j] and x[j − d + 1..j] = x[i..i + d − 1].
The word x[i..j] is primitive if there is no such d. □
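The shift test in this proof can be sketched in Python on a finite word, using inclusive indices as in the text. The helper names below are hypothetical, and the brute-force check against the definition of a power is included only for comparison.

```python
def is_power(x, i, j):
    """Shift test on x[i..j] (inclusive indices).

    x[i..j] is a power iff for some d with 0 < d < j - i + 1 the rotation by d
    is unchanged: x[i..j-d] == x[i+d..j] and x[j-d+1..j] == x[i..i+d-1].
    """
    length = j - i + 1
    for d in range(1, length):
        if x[i:j - d + 1] == x[i + d:j + 1] and x[j - d + 1:j + 1] == x[i:i + d]:
            return True
    return False


def is_primitive(x, i, j):
    return not is_power(x, i, j)


def is_power_by_definition(w):
    """w = y^k for some k >= 2 (brute force, for comparison only)."""
    n = len(w)
    return any(n % p == 0 and w == w[:p] * (n // p) for p in range(1, n))


print(is_power("murmur", 0, 5), is_power("murder", 0, 5))  # True False
```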

Theorem 3. Let x be a k-automatic sequence. The predicate LL(i, j, m, n) defined by
"x[i..j] < x[m..n]" is expressible.

Proof. We have x[i..j] < x[m..n] if and only if either

(a) j − i < n − m and x[i..j] = x[m..m + j − i]; or

(b) there exists t with 0 ≤ t ≤ min(j − i, n − m) such that x[i..i + t − 1] = x[m..m + t − 1] and
x[i + t] < x[m + t]. (For t = 0 the first condition is vacuous, since x[i..i − 1] is empty.)

□

Theorem 4. Let x be a k-automatic sequence. The predicate L(i, j) defined by "x[i..j] is a
Lyndon word" is expressible.
Proof. It suffices to check that x[i..j] is lexicographically less than each of its proper suffixes,
that is, that LL(i, j, i', j) holds for all i' with i < i' ≤ j. □
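For a finite word, Theorems 3 and 4 translate into the following Python sketch, a direct transcription of cases (a) and (b) with inclusive indices (the names are our own). It only makes the index arithmetic concrete; the results of this paper instead compile such predicates into automata.

```python
def LL(x, i, j, m, n):
    """x[i..j] < x[m..n] for inclusive indices, via cases (a) and (b) of Theorem 3."""
    # (a) x[i..j] is a proper prefix of x[m..n].
    if j - i < n - m and x[i:j + 1] == x[m:m + (j - i) + 1]:
        return True
    # (b) the first t letters agree, and then x[i+t] < x[m+t].
    for t in range(min(j - i, n - m) + 1):
        if x[i:i + t] == x[m:m + t] and x[i + t] < x[m + t]:
            return True
    return False


def L(x, i, j):
    """Theorem 4: x[i..j] is Lyndon iff LL(i, j, i', j) for every i' with i < i' <= j."""
    return all(LL(x, i, j, k, j) for k in range(i + 1, j + 1))


t = "01101001100101101"  # t[0..16], a prefix of the Thue-Morse word
print(L(t, 0, 2), L(t, 0, 3), L(t, 9, 16))  # True False True
```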

We can extend the definition of lexicographic order to infinite words in the obvious
way. We can extend the definition of Lyndon words to (right-) infinite words as follows:
an infinite word x = a_0 a_1 a_2 ··· is Lyndon if it is lexicographically less than all its suffixes
x[j..∞] = a_j a_{j+1} ··· for j ≥ 1. Then we have the following theorems.

Theorem 5. Let x be a k-automatic sequence. The predicate LL∞(i, j) defined by "x[i..∞] <
x[j..∞]" is expressible.

Proof. This is equivalent to ∃t ≥ 0 such that x[i..i + t − 1] = x[j..j + t − 1] and x[i + t] <
x[j + t]. □

Theorem 6. Let x be a k-automatic sequence. The predicate L∞(i) defined by "x[i..∞] is
an infinite Lyndon word" is expressible.

Proof. This is equivalent to LL∞(i, j) holding for all j > i. □



2 Lyndon factorization 

Siromoney et al. [12] proved that every infinite word x = a_0 a_1 a_2 ··· can be factorized uniquely
in exactly one of the following two ways:

(a) as x = w_1 w_2 w_3 ··· where each w_i is a finite Lyndon word and w_1 ≥ w_2 ≥ w_3 ≥ ···; or

(b) as x = w_1 w_2 ··· w_r w where w_i is a finite Lyndon word for 1 ≤ i ≤ r, w is an
infinite Lyndon word, and w_1 ≥ w_2 ≥ ··· ≥ w_r ≥ w.

If (a) holds we say that the Lyndon factorization of x is infinite; otherwise we say it is 
finite. 

Ido and Melançon [11, 10] gave an explicit description of the Lyndon factorization of the
Thue-Morse word t and the period-doubling sequence (among other things). (Recall that
the Thue-Morse word is given by t[n] = the number of 1's in the binary expansion of n,
taken modulo 2.) For the Thue-Morse word, this factorization is given by

t = w_1 w_2 w_3 w_4 ··· = (011)(01)(0011)(00101101) ···,

where each term in the factorization, after the second, is twice as long as the previous one.
Séébold [15] and Černý generalized these results to other related automatic sequences.
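The finite-word counterpart of this factorization can be computed with Duval's algorithm, a standard linear-time method that we swap in here for illustration (it is not the automaton technique of this paper). Since t[0..32] is exactly w_1 w_2 w_3 w_4 w_5, and these terms are Lyndon and nonincreasing, uniqueness of the factorization of a finite word means the sketch below should recover the first five terms; a prefix that ends in the middle of a term need not factor the same way.

```python
def thue_morse(length):
    """t[n] = number of 1's in the binary expansion of n, mod 2."""
    return "".join(str(bin(n).count("1") % 2) for n in range(length))


def duval(s):
    """Duval's algorithm: Lyndon factorization of a finite word in linear time."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


print(duval(thue_morse(33)))
# ['011', '01', '0011', '00101101', '0010110011010011']
```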

In this section, generalizing the work of Ido, Melançon, Séébold, and Černý, we prove
that the Lyndon factorization of a k-automatic sequence is itself k-automatic. Of course, we
need to explain how the factorization is encoded. The easiest and most natural way to do
this is to use an infinite word over {0, 1}, where the 1's indicate the positions where a new
term in the factorization begins. Thus, counting from 0, the i'th 1 appears at index |w_1 w_2 ··· w_i|.
For example, for the Thue-Morse word, this encoding is given by 

100101000100000001 ···.

If the factorization is infinite, then there are infinitely many 1's in its encoding; otherwise
there are finitely many 1's.
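A minimal sketch of this encoding, assuming the term lengths are already known (here 3, 2, 4, 8, 16, ..., read off from the Thue-Morse factorization above; the function name is hypothetical):

```python
def encode(term_lengths, length):
    """0/1 word of the given length with a 1 wherever a new term starts.

    The i'th 1 (counting from 0) sits at |w_1 ... w_i|, the total length
    of the first i terms.
    """
    bits = ["0"] * length
    start = 0
    for ell in term_lengths:
        if start >= length:
            break
        bits[start] = "1"
        start += ell
    return "".join(bits)


# Term lengths of the Thue-Morse factorization: 3, 2, 4, 8, 16, ...
tm_lengths = [3] + [2 ** e for e in range(1, 12)]
print(encode(tm_lengths, 18))  # 100101000100000001
```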

In order to prove the theorem, we need a number of results. We draw a distinction
between a factor f of x (which is just a word) and an occurrence of that factor (which
specifies the exact position at which f occurs). For example, in the Thue-Morse word t, the
factor 0110 occurs as t[0..3] and t[12..15] and many other places. We call [0..3] and [12..15],
and so forth, the occurrences of 0110. An occurrence is said to be Lyndon if the word at that
position is Lyndon. We say an occurrence O_1 = [i..j] is inside an occurrence O_2 = [i'..j'] if
i' ≤ i and j' ≥ j. If, in addition, either i' < i or j < j' (or both), then we say O_1 is strictly
inside O_2. These definitions are easily extended to the case where j or j' are equal to ∞,
and they correspond to the predicates I (inside) and SI (strictly inside) given below:

I(i, j, i', j') is i' ≤ i and j' ≥ j
SI(i, j, i', j') is I(i, j, i', j') and ((i' < i) or (j' > j))

An infinite Lyndon factorization

x = w_1 w_2 w_3 ···

then corresponds to an infinite sequence of occurrences

[i_1..j_1], [i_2..j_2], ...

where w_n = x[i_n..j_n] and i_{n+1} = j_n + 1 for n ≥ 1, while a finite Lyndon factorization

x = w_1 w_2 ··· w_r w

corresponds to a finite sequence of occurrences

[i_1..j_1], [i_2..j_2], ..., [i_r..j_r], [i_{r+1}..∞]

where w_n = x[i_n..j_n] and i_{n+1} = j_n + 1 for 1 ≤ n ≤ r.

Theorem 7. Let x be an infinite word. Every Lyndon occurrence in x appears inside a term
of the Lyndon factorization of x.

Proof. We prove the result for infinite Lyndon factorizations; the result for finite factorizations
is exactly analogous.

Suppose the factorization is x = w_1 w_2 w_3 ···. It suffices to show that no Lyndon occurrence
can span the boundary between two terms of the factorization. Suppose, contrary to
what we want to prove, that u w_i w_{i+1} ··· w_j v is a Lyndon word for some u that is a nonempty
suffix of w_{i−1} (possibly equal to w_{i−1}), some v that is a nonempty prefix of w_{j+1} (possibly
equal to w_{j+1}), and i ≤ j + 1. (If i = j + 1 then there are no w's at all between u and v.)

Since u is a suffix of w_{i−1} and w_{i−1} is Lyndon, we have u ≥ w_{i−1}. On the other hand, by
the definition of the Lyndon factorization we have w_{i−1} ≥ w_i ≥ ··· ≥ w_j ≥ w_{j+1}. But v is a prefix
of w_{j+1}, so by the definition of lexicographic order we have w_{j+1} ≥ v. Putting this
all together we get u ≥ v, and so ux ≥ v for all words x (either v is a prefix of u, or u and v
first differ at a position where u has the larger letter).

On the other hand, since u w_i ··· w_j v is Lyndon, it must be lexicographically less than
any of its proper suffixes, for instance v. So u w_i ··· w_j v < v. Taking x = w_i ··· w_j v gives a
contradiction with the conclusion of the previous paragraph. □

Corollary 8. The occurrence [i..j] corresponds to a term in the Lyndon factorization of x
if and only if

(a) [i..j] is Lyndon; and

(b) [i..j] does not occur strictly inside any other Lyndon occurrence.

Proof. Suppose [i..j] corresponds to a term w_n in the Lyndon factorization of x. Then
evidently [i..j] is Lyndon. If it occurred strictly inside some other Lyndon occurrence, say
[i'..j'], then we know from Theorem 7 that [i'..j'] itself lies inside some term w_m, so [i..j] must
lie strictly inside w_m, which is impossible, since distinct terms occupy disjoint occurrences.

Now suppose [i..j] is Lyndon and does not occur strictly inside any other Lyndon occurrence.
From Theorem 7, [i..j] must occur inside some term [i_m..j_m] of the factorization. If
[i..j] ≠ [i_m..j_m], then [i..j] lies strictly inside the Lyndon occurrence [i_m..j_m], a contradiction.
So [i..j] = [i_m..j_m] and hence corresponds to a term of the factorization. □

Corollary 9. The predicate LF(i, j) defined by "[i..j] corresponds to a term of the Lyndon
factorization of x" is expressible.

Proof. Indeed, by Corollary 8, the predicate LF(i, j) can be defined by

L(i, j) and ∀ i', j' (SI(i, j, i', j') ⇒ ¬L(i', j')).

□
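Corollaries 8 and 9 also suggest a direct, if slow, way to factor a finite word: keep exactly those Lyndon occurrences that are not strictly inside another Lyndon occurrence. The sketch below assumes the finite-word analogue of Corollary 8 (the proof of Theorem 7 carries over to the factorization of a finite word) and compares the brute-force answer with Duval's algorithm on random binary words; all names are our own.

```python
import random


def is_lyndon(w):
    return all(w < w[k:] for k in range(1, len(w)))


def strictly_inside(o1, o2):
    """SI: occurrence o1 = (i, j) lies strictly inside o2 = (i2, j2)."""
    (i, j), (i2, j2) = o1, o2
    return i2 <= i and j2 >= j and (i2 < i or j2 > j)


def lf_occurrences(s):
    """Occurrences satisfying LF: Lyndon, and not strictly inside a Lyndon occurrence."""
    lyn = [(i, j) for i in range(len(s)) for j in range(i, len(s))
           if is_lyndon(s[i:j + 1])]
    return [o for o in lyn if not any(strictly_inside(o, p) for p in lyn)]


def duval(s):
    """Duval's algorithm, used here only as a reference answer."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


random.seed(1)
for _ in range(300):
    s = "".join(random.choice("01") for _ in range(random.randint(1, 12)))
    assert [s[i:j + 1] for i, j in lf_occurrences(s)] == duval(s)
print("brute force agrees with Duval on 300 random words")
```

The brute force is polynomial but far slower than Duval's algorithm; its only purpose is to mirror the logical form of LF(i, j).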

We can now prove the main result of this section. 

Theorem 10. Using the encoding mentioned above, the Lyndon factorization of a k-automatic 
sequence is itself k-automatic. 

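Proof. Using the technique of [1], we can create an automaton that, on input i expressed in
base k, guesses j and checks if LF(i, j) holds. If so, it outputs 1; otherwise it outputs 0. To get the
last i in the case that the Lyndon factorization is finite, we also accept i if L∞(i) holds and
L∞(i') fails for all i' < i. (By Theorem 7, every infinite Lyndon occurrence lies inside the infinite
term of the factorization, but a proper suffix of that term may itself be an infinite Lyndon word,
so the term begins at the least such i.) □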

We also have 
Theorem 11. Let x be a k-automatic sequence. It is decidable if the Lyndon factorization
of x is finite or infinite.

Proof. The construction given above in the proof of Theorem 10 produces an automaton that
accepts finitely many distinct i (expressed in base k) if and only if the Lyndon factorization
of x is finite. □

We programmed up our method and found the Lyndon factorization of the Thue-Morse
sequence t, the period-doubling sequence d, the paperfolding sequence p, and the Rudin-Shapiro
sequence r, and their negations. (The results for Thue-Morse and the period-doubling
sequence were already given in [10], albeit in a different form.) Recall that the
period-doubling sequence is defined by d[n] = |t[n + 1] − t[n]|. The paperfolding sequence
p = 0010011 ··· arises as the limit of the sequence (f_n), where f_0 = 0 and f_{n+1} = f_n 0 (f̄_n)^R,
where R denotes reversal and the bar maps 0 to 1 and 1 to 0. Finally, the Rudin-Shapiro sequence
r is defined by r[n] = the number of (possibly overlapping) occurrences of 11 in the binary
expansion of n, taken modulo 2. The results are given in the theorem below.
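For experiments, the four sequences can be generated straight from these definitions. The Python sketch below is our own (the function names are hypothetical). Note that Python's str.count counts non-overlapping occurrences, so the Rudin-Shapiro helper counts overlapping occurrences of 11 by hand.

```python
def thue_morse(n):
    """t[k] = number of 1's in the binary expansion of k, mod 2."""
    return "".join(str(bin(k).count("1") % 2) for k in range(n))


def period_doubling(n):
    """d[k] = |t[k + 1] - t[k]|."""
    t = thue_morse(n + 1)
    return "".join(str(abs(int(t[k + 1]) - int(t[k]))) for k in range(n))


def paperfolding(n):
    """Prefix of the limit of f_0 = 0, f_{m+1} = f_m 0 reverse(complement(f_m))."""
    f = "0"
    while len(f) < n:
        f = f + "0" + "".join("1" if c == "0" else "0" for c in reversed(f))
    return f[:n]


def rudin_shapiro(n):
    """r[k] = number of (possibly overlapping) occurrences of 11 in binary k, mod 2."""
    def count_11(b):
        return sum(1 for a in range(len(b) - 1) if b[a:a + 2] == "11")
    return "".join(str(count_11(format(k, "b")) % 2) for k in range(n))


print(thue_morse(16))       # 0110100110010110
print(period_doubling(16))  # 1011101010111011
print(paperfolding(15))     # 001001100011011
print(rudin_shapiro(16))    # 0001001000011101
```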

Theorem 12. The occurrences corresponding to the Lyndon factorization of each word are
as follows:

• the Thue-Morse sequence t: [0..2], [3..4], [5..8], [9..16], [17..32], ..., [2^i + 1..2^{i+1}], ...;

• the negated Thue-Morse sequence t̄: [0..0], [1..∞];

• the Rudin-Shapiro sequence r: [0..6], [7..14], [15..30], ..., [2^i − 1..2^{i+1} − 2], ...;

• the negated Rudin-Shapiro sequence r̄: [0..0], [1..1], [2..2], [3..10], [11..42], [43..46], [47..174], ...;

• the paperfolding sequence p: [0..6], [7..14], [15..30], ..., [2^i − 1..2^{i+1} − 2], ...;

• the negated paperfolding sequence p̄: [0..0], [1..1], [2..4], [5..9], [10..20], [21..84], [85..340], ...,
[(4^i − 1)/3..4(4^i − 1)/3], ...;

• the period-doubling sequence d: [0..0], [1..4], [5..20], [21..84], ..., [(4^i − 1)/3..4(4^i − 1)/3], ...;

• the negated period-doubling sequence d̄: [0..1], [2..9], [10..41], [42..169], ...,
[2(4^i − 1)/3..2(4^{i+1} − 1)/3 − 1], ....
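Theorem 12 was obtained with the automaton method, but its first few terms can be spot-checked on finite prefixes with Duval's algorithm (a standard method swapped in for this purpose). The sketch below assumes, as in the Thue-Morse case, that a prefix ending exactly at a term boundary factors into the listed terms; this follows from uniqueness of the finite factorization once those terms are Lyndon and nonincreasing. The helper names are our own.

```python
def thue_morse(n):
    """t[k] = number of 1's in the binary expansion of k, mod 2 (as ints)."""
    return [bin(k).count("1") % 2 for k in range(n)]


def period_doubling(n):
    """d[k] = |t[k + 1] - t[k]|."""
    t = thue_morse(n + 1)
    return "".join(str(abs(t[k + 1] - t[k])) for k in range(n))


def paperfolding(n):
    """Prefix of the limit of f_0 = 0, f_{m+1} = f_m 0 reverse(complement(f_m))."""
    f = "0"
    while len(f) < n:
        f = f + "0" + "".join("1" if c == "0" else "0" for c in reversed(f))
    return f[:n]


def negate(s):
    """Exchange 0 and 1."""
    return "".join("1" if c == "0" else "0" for c in s)


def duval(s):
    """Duval's algorithm for the Lyndon factorization of a finite word."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def occurrences(terms):
    """Convert a list of terms into occurrences [i..j] (inclusive indices)."""
    occ, start = [], 0
    for w in terms:
        occ.append((start, start + len(w) - 1))
        start += len(w)
    return occ


print(occurrences(duval(period_doubling(21))))          # [(0, 0), (1, 4), (5, 20)]
print(occurrences(duval(negate(period_doubling(10)))))  # [(0, 1), (2, 9)]
print(occurrences(duval(paperfolding(31))))             # [(0, 6), (7, 14), (15, 30)]
```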



3 Enumeration 

There is a useful generalization of k-automatic sequences to sequences over N, the nonnegative
integers. A sequence (a_n)_{n≥0} over N is called k-regular if there exist vectors u
and v and a matrix-valued morphism μ such that a_n = u μ(w) v, where w is the base-k
representation of n. For more details, see [2].
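A linear representation is easy to evaluate directly. In the sketch below (our own example, not taken from this paper), the 2-regular sequence counting the 1's in the binary expansion of n is represented with u = (1, 0), v = (0, 1)^T, μ(0) the identity and μ(1) = [[1, 1], [0, 1]]. The digits of n are read most significant first, so μ(w) is the product of the digit matrices in that order.

```python
def mat_mul(a, b):
    """Product of matrices given as lists of rows."""
    return [[sum(a[r][s] * b[s][c] for s in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def regular_value(n, k, u, mu, v):
    """a_n = u mu(w) v, with w the base-k representation of n (most significant first)."""
    digits = []
    while n > 0:
        digits.append(n % k)
        n //= k
    row = [list(u)]  # u as a 1 x d matrix; n = 0 gives the empty word
    for d in reversed(digits):
        row = mat_mul(row, mu[d])
    return mat_mul(row, [[x] for x in v])[0][0]


# Illustrative linear representation of the number of 1's in binary.
u = (1, 0)
v = (0, 1)
mu = [[[1, 0], [0, 1]],   # mu(0)
      [[1, 1], [0, 1]]]   # mu(1)
print([regular_value(n, 2, u, mu, v) for n in range(8)])  # [0, 1, 1, 2, 1, 2, 2, 3]
```

Because μ(0) is the identity in this example, leading zeros in the input do not change the value.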
The subword complexity function ρ(n) of an infinite sequence x counts the number of
distinct length-n factors of x. There are also many variations, such as counting the number
of palindromic factors or unbordered factors. If x is k-automatic, then all three of these are
k-regular sequences [1]. We now show that the same result holds for primitive or Lyndon
factors.

Theorem 13. The function counting the number of length-n primitive (resp., Lyndon) factors
of a k-automatic sequence x is k-regular.

Proof. By the results of [4], it suffices to show that there is an automaton accepting the
base-k representations of pairs (n, i) such that the number of i's associated with each n
equals the number of primitive (resp., Lyndon) factors of length n.

To do so, it suffices to show that the predicate P(n, i) defined by "the factor of length n
beginning at position i is primitive (resp., Lyndon) and is the first occurrence of that factor
in x" is expressible. This is just

P(i, i + n − 1) and ∀ j < i x[i..i + n − 1] ≠ x[j..j + n − 1]

(resp., L(i, i + n − 1) and ∀ j < i x[i..i + n − 1] ≠ x[j..j + n − 1]),

where the two-argument P is the primitivity predicate of Theorem 2.

□



We used our method to compute these sequences for the Thue-Morse sequence, and the 
results are given below. 

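Theorem 14. Let p_L(n) denote the number of Lyndon factors of length n of the Thue-Morse
sequence. Then

$$
p_L(n) = \begin{cases}
1, & \text{if } n = 2^k \text{ or } n = 5 \cdot 2^k \text{ for } k \ge 1; \\
2, & \text{if } n = 1 \text{ or } n = 5 \text{ or } n = 3 \cdot 2^k \text{ for } k \ge 0; \\
0, & \text{otherwise.}
\end{cases}
$$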
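The counting in the proof of Theorem 13 can be imitated on a finite prefix: for each length n, count the positions i whose factor is Lyndon and occurs at no earlier position. The Python sketch below is a brute-force check, not the automaton construction. It assumes that the prefix of length 1024 contains every Thue-Morse factor of the lengths tested, and compares the counts with the case formula of Theorem 14; the names are our own.

```python
def thue_morse(n):
    return "".join(str(bin(k).count("1") % 2) for k in range(n))


def is_lyndon(w):
    return all(w < w[k:] for k in range(1, len(w)))


def count_lyndon_factors(x, n):
    """Number of i such that x[i..i+n-1] is Lyndon and occurs at no j < i."""
    seen = set()
    count = 0
    for i in range(len(x) - n + 1):
        f = x[i:i + n]
        if f not in seen:  # first occurrence of this factor
            seen.add(f)
            if is_lyndon(f):
                count += 1
    return count


def is_pow2(m):
    return m >= 1 and m & (m - 1) == 0


def theorem14(n):
    """Case formula of Theorem 14."""
    if n in (1, 5) or (n % 3 == 0 and is_pow2(n // 3)):
        return 2
    if (n >= 2 and is_pow2(n)) or (n % 5 == 0 and n >= 10 and is_pow2(n // 5)):
        return 1
    return 0


t = thue_morse(1024)
print([count_lyndon_factors(t, n) for n in range(1, 8)])  # [2, 1, 2, 1, 2, 2, 0]
```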

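Theorem 15. Let p_P(n) denote the number of primitive factors of length n of the Thue-Morse
sequence. Then for n ≥ 4, with k chosen so that 2^k ≤ n < 2^{k+1}, we have

$$
p_P(n) = \begin{cases}
3 \cdot 2^k - 4, & \text{if } n = 2^k; \\
4n - 2^k - 4, & \text{if } 2^k + 1 \le n < 3 \cdot 2^{k-1}; \\
5 \cdot 2^k - 6, & \text{if } n = 3 \cdot 2^{k-1}; \\
2n + 2^{k+1} - 2, & \text{if } 3 \cdot 2^{k-1} < n < 2^{k+1}.
\end{cases}
$$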
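A similar brute-force sketch checks the primitive-factor counts on small lengths. It uses the classical test that a nonempty word w is primitive exactly when its first occurrence in ww after position 0 is at position |w|, assumes the prefix of length 2048 contains every factor of the lengths tested, and evaluates the case formula of Theorem 15 only for n ≥ 4. The names are hypothetical.

```python
def thue_morse(n):
    return "".join(str(bin(k).count("1") % 2) for k in range(n))


def is_primitive(w):
    """Nonempty w is primitive iff w first reappears in ww at position |w|."""
    return (w + w).find(w, 1) == len(w)


def count_primitive_factors(x, n):
    """Number of distinct primitive factors of length n in the finite word x."""
    return len({x[i:i + n] for i in range(len(x) - n + 1) if is_primitive(x[i:i + n])})


def theorem15(n):
    """Case formula of Theorem 15 for n >= 4, with 2**k <= n < 2**(k + 1)."""
    k = n.bit_length() - 1
    if n == 2 ** k:
        return 3 * 2 ** k - 4
    if n < 3 * 2 ** (k - 1):
        return 4 * n - 2 ** k - 4
    if n == 3 * 2 ** (k - 1):
        return 5 * 2 ** k - 6
    return 2 * n + 2 ** (k + 1) - 2


t = thue_morse(2048)
print([count_primitive_factors(t, n) for n in range(4, 12)])
# [8, 12, 14, 20, 20, 24, 28, 32]
```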



We can also state a similar result for the Rudin-Shapiro sequence. 

Theorem 16. Let p_r(n) denote the number of Lyndon factors of length n of the Rudin-Shapiro
sequence. Then p_r(n) ≤ 8 for all n. This sequence is 2-automatic and there is an automaton
of 2444 states that generates it.
Proof. The proof was carried out by machine computation, and we briefly summarize how
it was done.

First, we created an automaton A to accept all pairs of integers (n, i), represented in base
2, such that the factor of length n in r, starting at position i, is a Lyndon factor, and is the
first occurrence of that factor in r. Thus, the number of distinct integers i associated with
each n is p_r(n). The automaton A has 102 states.

Using the techniques in [4], we then used A to create matrices M_0 and M_1 of dimension
102 × 102, and vectors v, w such that v M_x w = p_r(n), if x is the base-2 representation of n.
Here if x = a_1 a_2 ··· a_t, then by M_x we mean the product M_{a_1} M_{a_2} ··· M_{a_t}.

From this we then created a new automaton A' where the states are products of the form
v M_x for binary strings x and the transitions are on 0 and 1. This automaton was built using
a breadth-first approach, using a queue to hold states whose targets on 0 and 1 are not yet
known. Of course, this was not guaranteed to terminate, because a priori we did not know
that p_r(n) is bounded. But the procedure did terminate at 2444 states, and the product of
the v M_x corresponding to each state with w gives an integer less than or equal to 8, thus
proving the desired result and also providing an automaton to compute p_r(n). □
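The breadth-first construction can be sketched generically. The Python code below is our own illustration, not the program used for Theorem 16: it starts from a row vector v, follows transitions s → s M_a for each binary digit a, merges equal vectors, and gives up after a user-chosen number of states, since termination is not guaranteed in general. The two example linear representations (one for the Thue-Morse sequence, one for the unbounded count of 1's in binary) are our own choices.

```python
from collections import deque


def row_times(row, m):
    """Row vector times matrix."""
    return tuple(sum(row[r] * m[r][c] for r in range(len(row)))
                 for c in range(len(m[0])))


def build_automaton(v, mats, w, max_states=10_000):
    """States are the distinct vectors v*M_x; returns None if max_states is exceeded."""
    start = tuple(v)
    index = {start: 0}
    states = [start]
    delta = {}
    queue = deque([start])  # states whose targets are not yet known
    while queue:
        s = queue.popleft()
        for a, m in enumerate(mats):
            target = row_times(s, m)
            if target not in index:
                if len(states) == max_states:
                    return None
                index[target] = len(states)
                states.append(target)
                queue.append(target)
            delta[(index[s], a)] = index[target]
    outputs = [sum(x * y for x, y in zip(s, w)) for s in states]
    return states, delta, outputs


def run(automaton, n):
    """Output of the state reached on the base-2 digits of n (most significant first)."""
    states, delta, outputs = automaton
    q = 0
    for digit in format(n, "b"):
        q = delta[(q, int(digit))]
    return outputs[q]


identity = [[1, 0], [0, 1]]
tm = build_automaton((1, 0), [identity, [[0, 1], [1, 0]]], (0, 1))
print(len(tm[0]), [run(tm, n) for n in range(8)])  # 2 [0, 1, 1, 0, 1, 0, 0, 1]
print(build_automaton((1, 0), [identity, [[1, 1], [0, 1]]], (0, 1), max_states=50))  # None
```

The second call illustrates the caveat in the proof: for an unbounded sequence, the set of distinct vectors v M_x is infinite, so the search has to be cut off.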

4 Finite factorizations 

Of course, the original Lyndon factorization was for finite words: every finite nonempty word
x can be factored uniquely as a nonincreasing product w_1 w_2 ··· w_m of Lyndon words. We can
apply this theorem to all prefixes of a k-automatic sequence. It is then natural to wonder
if a single automaton can encode all the Lyndon factorizations of all finite prefixes. The
answer is yes, as the following result shows.

Theorem 17. Suppose x is a k-automatic sequence. Then there is an automaton A accepting

{(n, i)_k : the Lyndon factorization of x[0..n − 1] is w_1 w_2 ··· w_m with w_m = x[i..n − 1]}.

Proof. As is well known [7], if w_1 w_2 ··· w_m is the Lyndon factorization of a finite word y, then w_m is the
lexicographically least suffix of y. So to accept (n, i)_k we check that x[i..n − 1] <
x[j..n − 1] for all j with 0 ≤ j < n and j ≠ i. □

Given A, we can find the complete factorization of any prefix x[0..n − 1] by using this
automaton to find the appropriate i (as described in [9]) and then replacing n with i.
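The least-suffix characterization gives a simple, quadratic-time way to recover a whole factorization of a finite word, mirroring the procedure just described: find the start i of the least suffix of x[0..n − 1], record x[i..n − 1] as the last term, and repeat with n replaced by i. The sketch below is our own Python illustration (hypothetical names), not the automaton A itself.

```python
def least_suffix_start(s):
    """Start of the lexicographically least nonempty suffix (suffixes are distinct)."""
    return min(range(len(s)), key=lambda i: s[i:])


def factorize_by_least_suffix(s):
    """Peel off the last Lyndon term (the least suffix), then replace n with i."""
    terms = []
    n = len(s)
    while n > 0:
        i = least_suffix_start(s[:n])
        terms.append(s[i:n])
        n = i
    return terms[::-1]


def is_lyndon(w):
    return all(w < w[k:] for k in range(1, len(w)))


tm33 = "".join(str(bin(k).count("1") % 2) for k in range(33))
print(factorize_by_least_suffix(tm33))
# ['011', '01', '0011', '00101101', '0010110011010011']
```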

We carried out this construction for the Thue-Morse sequence, and the result is shown
below in Figure 1.
Figure 1: A finite automaton accepting the base-2 representation of (n, i) such that the
Lyndon factorization of t[0..n − 1] ends in the term t[i..n − 1]



In a similar manner, there is an automaton that encodes the factorization of every factor
of a k-automatic sequence:

Theorem 18. Suppose x is a k-automatic sequence. Then there is an automaton A' accepting

{(i, j, l)_k : the Lyndon factorization of x[i..j − 1] is w_1 w_2 ··· w_m with w_m = x[l..j − 1]}.

We calculated A' for the Thue-Morse sequence using our method. It is a 34-state machine
and is displayed in Figure 2.




Figure 2: A finite automaton accepting the base-2 representation of (i, j, l) such that the
Lyndon factorization of t[i..j − 1] ends in the term t[l..j − 1]

Another quantity of interest is the number of terms in the Lyndon factorization of each 
prefix. 

Theorem 19. Let x be a k-automatic sequence. Then the sequence (f(n))_{n≥0} defined by

f(n) = the number of terms in the Lyndon factorization of x[0..n]

is k-regular.

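Proof. We construct an automaton to accept

{(n, i) : ∃ j ≤ n such that L(i, j) and ∀ i', j' ((SI(i, j, i', j') and 0 ≤ i' ≤ j' ≤ n) ⇒ ¬L(i', j'))}.

For each n, the accepted i are exactly the starting positions of the terms in the Lyndon
factorization of x[0..n] (by the analogue of Corollary 8 for finite words), so the number of i
associated with n is f(n), and the result follows from [4]. □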
For the Thue-Morse sequence the corresponding sequence satisfies the relations

$$
\begin{aligned}
f(4n+1) &= -f(2n) + f(2n+1) + f(4n) \\
f(8n+2) &= -f(2n) + f(4n) + f(4n+2) \\
f(8n+3) &= -f(2n) + f(4n) + f(4n+3) \\
f(8n+6) &= -f(2n) - f(4n+2) + 3f(4n+3) \\
f(8n+7) &= -f(2n) + 2f(4n+3) \\
f(16n) &= -f(2n) + f(4n) + f(8n) \\
f(16n+4) &= -f(2n) + f(4n) + f(8n+4) \\
f(16n+8) &= -f(2n) + f(4n+3) + f(8n+4) \\
f(16n+12) &= -f(2n) - 2f(4n+2) + 3f(4n+3) + f(8n+4)
\end{aligned}
$$

for n ≥ 1, which allows efficient calculation of this quantity.
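These relations can be spot-checked numerically. The Python sketch below (our own, with hypothetical names) computes f(n) directly by running Duval's algorithm, a standard method rather than the automaton of Theorem 19, on every prefix t[0..n], and then tests all nine relations for 1 ≤ n ≤ 20.

```python
def thue_morse(n):
    return "".join(str(bin(k).count("1") % 2) for k in range(n))


def duval(s):
    """Duval's algorithm for the Lyndon factorization of a finite word."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


N = 20
t = thue_morse(16 * N + 13)
# f[m] = number of terms in the Lyndon factorization of t[0..m]
f = [len(duval(t[:m + 1])) for m in range(len(t))]


def relations_hold(n):
    """Check the nine relations stated above for one value of n."""
    return all([
        f[4*n + 1] == -f[2*n] + f[2*n + 1] + f[4*n],
        f[8*n + 2] == -f[2*n] + f[4*n] + f[4*n + 2],
        f[8*n + 3] == -f[2*n] + f[4*n] + f[4*n + 3],
        f[8*n + 6] == -f[2*n] - f[4*n + 2] + 3*f[4*n + 3],
        f[8*n + 7] == -f[2*n] + 2*f[4*n + 3],
        f[16*n] == -f[2*n] + f[4*n] + f[8*n],
        f[16*n + 4] == -f[2*n] + f[4*n] + f[8*n + 4],
        f[16*n + 8] == -f[2*n] + f[4*n + 3] + f[8*n + 4],
        f[16*n + 12] == -f[2*n] - 2*f[4*n + 2] + 3*f[4*n + 3] + f[8*n + 4],
    ])


print(f[:16])  # [1, 1, 1, 2, 2, 3, 4, 3, 3, 4, 5, 4, 5, 4, 4, 5]
print(all(relations_hold(n) for n in range(1, N + 1)))
```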